6 research outputs found

    NoC-based Architectures for Real-Time Applications: Performance Analysis and Design Space Exploration

    Monoprocessor architectures have reached their limits with regard to the computing power they offer relative to the needs of modern systems. Although multicore architectures partially mitigate this limitation and are commonly used nowadays, they usually rely on intrinsically non-scalable buses to interconnect the cores. The manycore paradigm was proposed to tackle the scalability issue of bus-based multicore processors. It can scale up to hundreds of processing elements (PEs) on a single chip by organizing them into computing tiles (each holding one or several PEs). Intercore communication is usually done using a Network-on-Chip (NoC) that consists of interconnected on-chip routers allowing communication between tiles. However, manycore architectures raise numerous challenges, particularly for real-time applications. First, NoC-based communication tends to generate complex blocking patterns when congestion occurs, which complicates the analysis, since computing accurate worst-case delays becomes difficult. Second, running many applications on large Systems-on-Chip such as manycore architectures makes system design particularly crucial and complex. On the one hand, it complicates Design Space Exploration, as it multiplies the implementation alternatives that guarantee the desired functionalities. On the other hand, once a hardware architecture is chosen, mapping the tasks of all applications onto the platform is a hard problem, and finding an optimal solution in a reasonable amount of time is not always possible. Therefore, our first contributions address the need for computing tight worst-case delay bounds in wormhole NoCs. We first propose a buffer-aware worst-case timing analysis (BATA) to derive upper bounds on the worst-case end-to-end delays of constant-bit-rate data flows transmitted over a NoC on a manycore architecture. We then extend BATA to cover a wider range of traffic types, including bursty traffic flows, and heterogeneous architectures. The introduced method is called G-BATA, for Graph-based BATA. In addition to covering a wider range of assumptions, G-BATA reduces the computation time and thus improves the scalability of the method. In a second part, we develop a method addressing design and mapping for applications with real-time constraints on manycore platforms. It combines model-based engineering tools (TTool) and simulation with our analytical verification technique (G-BATA) and tools (WoPANets) to provide an efficient design space exploration framework. Finally, we validate our contributions on (a) a series of experiments on a physical platform and (b) two case studies taken from the real world: an autonomous vehicle control application and a 5G signal decoder application.

    Buffer-Aware Worst-Case Timing Analysis of Wormhole NoCs Using Network Calculus

    Conducting worst-case timing analyses for wormhole Networks-on-Chip (NoCs) is fundamental to guaranteeing real-time requirements, but it is known to be challenging due to the complex congestion patterns that can occur. In that respect, we introduce in this paper a new buffer-aware timing analysis of wormhole NoCs based on Network Calculus. Our main idea consists in considering the flow serialization phenomenon along the path of a flow of interest (f.o.i.), by paying the bursts of interfering flows only at the first convergence point, and refining the interference patterns for the f.o.i. by accounting for the limited buffer size. Moreover, we aim to handle this issue for wormhole NoCs implementing a fixed-priority preemptive arbitration of Virtual Channels (VCs), where each VC can be assigned to an arbitrary number of traffic classes with different priority levels (VC sharing), and each traffic class may contain an arbitrary number of flows (priority sharing). It is worth noting that such characteristics cover a wide range of wormhole NoCs. The derived delay bounds are analyzed and compared to available results of existing approaches based on Scheduling Theory and on Compositional Performance Analysis (CPA). In doing so, we highlight a noticeable improvement in delay-bound tightness in comparison to the CPA approach, and the inherent safety of the bounds of our proposal in comparison to Scheduling Theory approaches. Finally, we perform experiments on a manycore platform to compare our timing analysis predictions with experimental data and assess their tightness.
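
    To illustrate the kind of bound Network Calculus provides, here is a minimal sketch of the textbook result for a token-bucket arrival curve crossing rate-latency servers. It is not the BATA analysis itself: it ignores buffer sizes, backpressure and interfering flows. The class names, function names and numeric values are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenBucket:
    """Arrival curve alpha(t) = burst + rate * t (e.g. flits, flits/cycle)."""
    burst: float
    rate: float


@dataclass(frozen=True)
class RateLatency:
    """Service curve beta(t) = rate * max(0, t - latency)."""
    rate: float
    latency: float


def delay_bound(alpha: TokenBucket, beta: RateLatency) -> float:
    """Maximum horizontal distance between alpha and beta: T + sigma / R."""
    if alpha.rate > beta.rate:
        raise ValueError("arrival rate exceeds service rate: no finite bound")
    return beta.latency + alpha.burst / beta.rate


def concatenate(*betas: RateLatency) -> RateLatency:
    """Min-plus convolution of rate-latency curves: min rate, summed latency."""
    return RateLatency(min(b.rate for b in betas), sum(b.latency for b in betas))


def output_curve(alpha: TokenBucket, beta: RateLatency) -> TokenBucket:
    """Arrival curve of the flow after the server: the burst grows by rho * T."""
    return TokenBucket(alpha.burst + alpha.rate * beta.latency, alpha.rate)


# Hypothetical flow and two identical hops (values chosen for illustration only).
flow = TokenBucket(burst=4.0, rate=0.25)
hop1 = RateLatency(rate=1.0, latency=2.0)
hop2 = RateLatency(rate=1.0, latency=2.0)

end_to_end = delay_bound(flow, concatenate(hop1, hop2))
hop_by_hop = delay_bound(flow, hop1) + delay_bound(output_curve(flow, hop1), hop2)
print(end_to_end, hop_by_hop)
```

    Analyzing the concatenated path charges the flow's burst once, whereas summing per-hop bounds charges it at every hop, so the end-to-end bound is never larger than the hop-by-hop sum.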

    Work-in-Progress: Extending Buffer-Aware Worst-Case Timing Analysis of Wormhole NoCs

    Worst-case timing analysis of Networks-on-Chip (NoCs) is crucial to designing safe real-time systems based on manycore architectures. In this paper, we present some potential extensions of our previously published buffer-aware worst-case timing analysis approach to cope with bursty traffic such as real-time audio and video streams. A first promising lead is to improve the algorithm that analyzes backpressure patterns so that it captures the consecutive-packet queueing effect while keeping the information about the dependencies between flows. Furthermore, the improved algorithm may also decrease the inherent complexity of computing the indirect blocking latency due to backpressure.

    Graph-based Approach for Buffer-aware Timing Analysis of Heterogeneous Wormhole NoCs under Bursty Traffic

    This paper addresses the problem of worst-case timing analysis of heterogeneous wormhole NoCs, i.e., NoCs whose routers have different buffer sizes and transmission speeds, when consecutive-packet queuing (CPQ) occurs. The latter means that several consecutive packets of one flow are queuing in the network. This scenario happens in the case of bursty traffic but also for non-schedulable traffic. Conducting such an analysis is known to be challenging due to the sophisticated congestion patterns that arise when backpressure mechanisms are enabled. We tackle this problem by extending the applicability domain of our previous work for computing maximum delay bounds using Network Calculus, called Buffer-aware worst-case Timing Analysis (BATA). We propose a new graph-based approach to improve the analysis of indirect blocking due to backpressure, while capturing the CPQ effect and keeping the information about dependencies between flows. Furthermore, the introduced approach reduces the complexity of computing indirect-blocking delay bounds and ensures the safety of these bounds even for non-schedulable traffic. We provide further insights into the tightness and complexity of the worst-case delay bounds yielded by the extended BATA with the graph-based approach, denoted G-BATA. Our assessments show that the complexity has decreased by up to 100 times while offering an average tightness ratio of 71%, with reference to the basic BATA. Finally, we evaluate the improvements yielded by G-BATA for a realistic use case against a recent state-of-the-art approach. This evaluation shows the applicability of G-BATA under more general assumptions and the impact of such a feature on the tightness and computation time.
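
    As a rough illustration of why a graph helps keep track of dependencies between flows, the sketch below builds a contention graph from flow paths and separates the flows that share a link with the flow of interest from those that can only affect it through other flows. This is a deliberate simplification and not the G-BATA algorithm: it ignores the direction of backpressure, buffer sizes and priorities, so it over-approximates. All flow names, link names and function names are hypothetical.

```python
from collections import deque


def contention_graph(paths):
    """Undirected graph in which two flows are adjacent if they share a link."""
    graph = {flow: set() for flow in paths}
    flows = list(paths)
    for i, f in enumerate(flows):
        for g in flows[i + 1:]:
            if set(paths[f]) & set(paths[g]):
                graph[f].add(g)
                graph[g].add(f)
    return graph


def blocking_sets(paths, foi):
    """Return (direct, indirect) sets of potentially blocking flows for foi.

    direct:   flows sharing at least one link with foi
    indirect: flows reachable from foi in the contention graph that are
              neither foi nor direct (assumption: any chain may propagate
              blocking, which is pessimistic)
    """
    graph = contention_graph(paths)
    direct = set(graph[foi])
    seen = {foi} | direct
    indirect = set()
    queue = deque(direct)
    while queue:
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                indirect.add(neighbour)
                queue.append(neighbour)
    return direct, indirect


# Hypothetical flows on a line of routers; a link is a (source, destination) pair.
paths = {
    "f1": [("r0", "r1"), ("r1", "r2")],
    "f2": [("r1", "r2"), ("r2", "r3")],
    "f3": [("r2", "r3"), ("r3", "r4")],
    "f4": [("r5", "r6")],
}
direct, indirect = blocking_sets(paths, "f1")
print(sorted(direct), sorted(indirect))
```

    Here f2 shares a link with f1, f3 shares a link only with f2, and f4 is isolated, so f3 can affect f1 only through f2.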

    Tightness and Computation Assessment of Worst-Case Delay Bounds in Wormhole Networks-On-Chip

    This paper addresses the problem of worst-case timing analysis in wormhole Networks-on-Chip (NoCs). We consider our previous work [5] for computing maximum delay bounds using Network Calculus, called the Buffer-Aware Worst-case Timing Analysis (BATA). The latter allows the computation of delay bounds for a wide range of wormhole NoCs, e.g., handling priority sharing, Virtual Channel sharing and buffer backpressure. In this paper, we provide further insights into the tightness and computation issues of the worst-case delay bounds yielded by BATA. Our assessment shows that the gap between the computed delay bounds and the worst-case simulation results is reasonably small (70% tightness on average). Furthermore, BATA provides good delay bounds for medium-scale configurations in less than one hour. Finally, we evaluate the improvements yielded by BATA for a realistic use case against a recent state-of-the-art approach. This evaluation shows the applicability of BATA under more general assumptions and the impact of such a feature on the tightness and computation time.
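
    The abstract does not define the tightness figure. A minimal sketch, assuming it is the ratio of the worst delay observed in simulation to the analytical bound, averaged over flows, could look as follows. The function names and sample values are hypothetical and are not results from the paper.

```python
from statistics import mean


def tightness_ratio(simulated_worst_case: float, analytical_bound: float) -> float:
    """Ratio in (0, 1]; values closer to 1 mean a tighter bound.

    Assumed definition: worst delay seen in simulation / analytical bound.
    """
    if analytical_bound <= 0:
        raise ValueError("the analytical bound must be positive")
    if simulated_worst_case > analytical_bound:
        # A safe upper bound can never be exceeded by an observed delay.
        raise ValueError("observed delay exceeds the bound: bound is not safe")
    return simulated_worst_case / analytical_bound


def average_tightness(pairs) -> float:
    """Mean tightness over (simulated_worst_case, analytical_bound) pairs."""
    return mean(tightness_ratio(sim, bound) for sim, bound in pairs)


# Hypothetical per-flow measurements, in cycles (illustration only).
samples = [(50.0, 100.0), (75.0, 100.0)]
print(average_tightness(samples))
```

    Under this reading, a simulated delay above the bound signals an unsafe analysis rather than a tightness greater than 100%, which is why the sketch rejects it.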

    Manycore Platforms with Network-on-Chip for Real-Time Applications: Performance Analysis and Design Space Exploration

    Single-processor architectures are reaching their limits in terms of computing power relative to the needs of current systems. Although multicore architectures partially solve this problem, they generally use buses to interconnect the cores, and this solution does not scale. So-called manycore architectures were proposed to overcome the limitations of multicore processors. They can bring together up to hundreds of cores on a single chip, organized into tiles containing one or several processing elements. Communication between cores is generally carried out by a network-on-chip made up of interconnected routers that allow data exchange between tiles. However, these architectures pose numerous challenges, particularly for real-time applications. On the one hand, communication over a network-on-chip causes blocking scenarios between flows, which complicates the analysis, since determining the worst case becomes difficult. On the other hand, running many applications on large systems-on-chip such as manycore architectures makes the design of such systems particularly complex. First, it multiplies the implementation options that satisfy the functional constraints, and the resulting design space exploration takes longer. Second, once a hardware architecture has been chosen, deciding how to assign each task of the applications to the different cores is a hard problem, to the point that finding an optimal solution in a reasonable time is not always possible. Our first contributions therefore address the need to compute reliable bounds on the worst-case transmission latencies of data flows crossing so-called "wormhole" networks-on-chip. We propose an analytical model, BATA, that takes into account the size of the router buffers and applies to configurations of periodic data flows that generate one packet at a time. We then extend the applicability domain of BATA to cover a more general traffic model as well as heterogeneous architectures. This new method, called G-BATA, relies on a graph structure to capture the possible interference between data flows. It also reduces the computation time of the analysis, improving the scalability of the approach. In a second part, we propose a method for designing real-time applications running on manycore platforms. This method integrates our G-BATA analysis model into a systematic design process that also involves TTool, a system modeling and simulation tool based on model-driven engineering concepts, and WoPANets, a software tool for worst-case network performance analysis. Finally, we validate our contributions through (a) a series of experiments on a physical platform and (b) two real-world case studies: the control system of an autonomous vehicle and a 5G decoder application.